Dataset: diamonds - contains information about ~54,000 diamonds, including price, carat, colour, clarity and cut.

Transformations and Stats

Simple bar plot to start our learning

ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut))

We supplied only the x variable, yet R still draws a plot. How does it compute the bar heights? geom_bar() uses stat_count() by default, which counts the rows at each value of x (one "bin" per cut) and plots those counts on the y-axis.
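To see the numbers behind the bars, the counts can be reproduced outside the plot. This is a minimal sketch that assumes only that ggplot2 (which ships the diamonds data) is installed; cut_counts and p are illustrative names.

```r
library(ggplot2)

# Count the rows at each level of cut; these are the bar heights
cut_counts <- as.data.frame(table(cut = diamonds$cut))
cut_counts

# layer_data() returns the values ggplot2 computed for a layer
p <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut))
layer_data(p)[, c("x", "count")]
```

The count column returned by layer_data() should match the Freq column of the table.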

You can generally use geoms and stats interchangeably. For example, you can recreate the previous plot using stat_count() instead of geom_bar().

ggplot(data = diamonds) + 
  stat_count(mapping = aes(x = cut))

#1. If I also map y to a frequency, R will throw an error
# ggplot(data = diamonds) + 
#  stat_count(mapping = aes(x = cut, y = frequency(x)))

#2. If I try the same thing with geom_bar(), R will still throw an error. Why?
# ggplot(data = diamonds) + 
#  geom_bar(mapping = aes(x = cut, y = frequency(x)))
    #This is because in both cases the stat is stat_count(), which computes the count itself and so accepts either an x or a y aesthetic, but not both.
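The error can be inspected without stopping the session. The sketch below assumes a recent ggplot2 and maps y to carat purely as an example column; the stat only runs when the plot is built, so that is where the error is caught.

```r
library(ggplot2)

# Map both x and y while stat_count() is still the stat in use
p <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = carat))

# Building the plot triggers the stat; capture the error instead of failing
result <- tryCatch(ggplot_build(p), error = function(e) e)
inherits(result, "error")
conditionMessage(result)
```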

Can we override a default stat (which is a count or a summary) to identity (which is the raw value of a variable)? Yes.

#Example
demo <- tribble(
  ~cut,         ~freq,
  "Fair",       1610,
  "Good",       4906,
  "Very Good",  12082,
  "Premium",    13791,
  "Ideal",      21551
)
demo
## # A tibble: 5 × 2
##   cut        freq
##   <chr>     <dbl>
## 1 Fair       1610
## 2 Good       4906
## 3 Very Good 12082
## 4 Premium   13791
## 5 Ideal     21551
ggplot(data = demo) +
  geom_bar(mapping = aes(x = cut, y = freq), stat = "identity")
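As a possible shorthand, geom_col() uses stat = "identity" by default, so it plots the y values exactly as given. The sketch below rebuilds the demo table with data.frame() to stay self-contained; p_col is an illustrative name.

```r
library(ggplot2)

demo <- data.frame(
  cut  = c("Fair", "Good", "Very Good", "Premium", "Ideal"),
  freq = c(1610, 4906, 12082, 13791, 21551)
)

# geom_col() plots freq as the bar height without counting rows
p_col <- ggplot(data = demo) +
  geom_col(mapping = aes(x = cut, y = freq))
p_col
```

Because cut is a character column here, the bars are ordered alphabetically rather than by cut quality.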

#You can also override a default mapping from transformed variables to aesthetics
#after_stat(prop) replaces stat(prop), which was deprecated in ggplot2 3.4.0
ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, y = after_stat(prop), group = 1))
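The proportions can be checked by hand. This sketch assumes ggplot2 3.3.0 or later (for after_stat()); cut_prop is an illustrative name. group = 1 puts all rows in one group, so each proportion is taken over the whole data set; without it, the proportion is computed within each cut and every bar has height 1.

```r
library(ggplot2)

# Share of diamonds in each cut, computed over the whole data set
cut_prop <- prop.table(table(diamonds$cut))
round(cut_prop, 3)

# The same values as computed by the stat
p <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = after_stat(prop), group = 1))
layer_data(p)$prop
```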

Can we plot statistical details using ggplot2?

ggplot(data = diamonds) + 
  stat_summary(
    mapping = aes(x = cut, y = depth),
    fun.min = min,
    fun.max = max,
    fun = median
  ) #fun computes the point drawn for each cut (here the median); fun.min and fun.max set the two ends of the line.
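The values that stat_summary() draws can be tabulated directly. A minimal sketch using base R; depth_summary is an illustrative name.

```r
library(ggplot2)

# One row per cut: the minimum, median and maximum depth
depth_summary <- t(sapply(
  split(diamonds$depth, diamonds$cut),
  function(d) c(min = min(d), median = median(d), max = max(d))
))
depth_summary
```

Each row corresponds to one vertical line in the plot: the point sits at the median and the line spans the minimum to the maximum.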

Aesthetic Adjustments

Three scenarios trying different aspects of aesthetics in ggplot2

#colour -> gives border colours for each of the bars in the plot
ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, colour = cut))

#fill -> adds colour to the bars
ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, fill = cut))

#Here, the fill argument is assigned to another variable in the 'diamonds' dataset called 'clarity'
#Notice how the stacking is done automatically. This is done behind the scenes with a position argument.
ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, fill = clarity)) #Looks very messy

#Try altering the transparency of the bars. With position = "identity" the bars overlap, so a low alpha keeps the ones behind visible.
ggplot(data = diamonds, mapping = aes(x = cut, fill = clarity)) + 
  geom_bar(alpha = 1/5, position = "identity")

#To color the bar outlines with no fill color
ggplot(data = diamonds, mapping = aes(x = cut, colour = clarity)) + 
  geom_bar(fill = NA, position = "identity")

#Let's make stacked bar plots
#position = "fill" works like stacking, but makes each set of stacked bars the same height.
ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, fill = clarity), position = "fill")

#position = "dodge" places overlapping objects directly beside one another.
ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, fill = clarity), position = "dodge")

#Using jitter for scatterplots
#position = "jitter" adds a small amount of random noise to each point to avoid overplotting when points overlap. This is useful for scatterplots but not barplots.
ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy), position = "jitter")
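Because the noise is random, the plot changes slightly on every run. One way to control this, sketched here with arbitrary width, height and seed values, is to pass position_jitter() instead of the string "jitter".

```r
library(ggplot2)

# width and height bound the noise in data units; seed makes it repeatable
p_jitter <- ggplot(data = mpg) +
  geom_point(
    mapping = aes(x = displ, y = hwy),
    position = position_jitter(width = 0.1, height = 0.5, seed = 42)
  )
p_jitter
```

geom_jitter() is a shorthand for geom_point(position = "jitter").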

Grammar of Graphics

ggplot(data = DATA) +
  GEOM_FUNCTION(
    mapping = aes(MAPPINGS),
    stat = STAT,
    position = POSITION
  ) +
  FACET_FUNCTION
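As an illustration, the template can be filled in with the diamonds examples from earlier. The choice of facet_wrap(~ color) is an assumption made for this sketch, not part of the notes above.

```r
library(ggplot2)

p <- ggplot(data = diamonds) +            # DATA
  geom_bar(                               # GEOM_FUNCTION
    mapping = aes(x = cut, fill = clarity),  # MAPPINGS
    stat = "count",                       # STAT (the default for geom_bar)
    position = "dodge"                    # POSITION
  ) +
  facet_wrap(~ color)                     # FACET_FUNCTION: one panel per colour grade
p
```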

Dataset: mpg - Using ggplot2 for communication

Labels

Using the labs() function

#Scenario 1: title, subtitle, caption
ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth(se = FALSE) +
  labs(
    title = "Fuel efficiency generally decreases with engine size",
    subtitle = "Two seaters (sports cars) are an exception because of their light weight",
    caption = "Data from fueleconomy.gov"
  )
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

#Scenario 2: axes labels and legend titles
ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  geom_smooth(se = FALSE) +
  labs(
    x = "Engine displacement (L)",
    y = "Highway fuel economy (mpg)",
    colour = "Car type"
  )
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Annotations

What if you want to add text directly to the plot? Here we use geom_text() to add text labels. It works like geom_point(), but draws a label at each position instead of a shape.

best_in_class <- mpg %>%
  group_by(class) %>%
  filter(row_number(desc(hwy)) == 1)

ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  geom_text(aes(label = model), data = best_in_class)

#To move labels away from their points, use the nudge_x and nudge_y arguments of geom_text(); check_overlap = TRUE drops labels that would overlap an earlier one
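A minimal sketch of label placement, assuming ggplot2 and dplyr are installed. The relevant arguments of geom_text() are nudge_x and nudge_y; the nudge distance of 2 is an arbitrary choice for illustration.

```r
library(ggplot2)
library(dplyr)

# Most fuel-efficient car in each class (one row per class)
best_in_class <- mpg %>%
  group_by(class) %>%
  filter(row_number(desc(hwy)) == 1)

ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  geom_text(
    aes(label = model),
    data = best_in_class,
    nudge_y = 2,          # shift each label 2 units up the y-axis
    check_overlap = TRUE  # skip labels that would overlap an earlier one
  )
```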

Scales

We can change the default scales by tweaking the values in the scale parameters.

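Observe how the readability of the graph changes as the limits of the x-axis are tightened. Scenario 1 spells out the default scales, so it matches the plot you get without any scale calls.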

#Scenario 1
ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  scale_x_continuous() +
  scale_y_continuous() +
  scale_colour_discrete()

#Scenario 2
ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  scale_x_continuous(limits = c(0, 15)) +
  scale_y_continuous() +
  scale_colour_discrete()

#Scenario 3
ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  scale_x_continuous(limits = c(0, 10)) +
  scale_y_continuous() +
  scale_colour_discrete()

#Scenario 4
ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  scale_x_continuous(limits = c(0, 8)) +
  scale_y_continuous() +
  scale_colour_discrete()
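Rather than guessing limits, they can be derived from the data. A minimal sketch; displ_range is an illustrative name, and rounding outwards to whole litres is just one possible choice.

```r
library(ggplot2)

# Smallest and largest engine displacement in the data
displ_range <- range(mpg$displ)
displ_range

# Round outwards so that no point falls outside the limits
ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  scale_x_continuous(limits = c(floor(displ_range[1]), ceiling(displ_range[2])))
```

Limits set on a continuous scale remove any values that fall outside them (with a warning), so limits narrower than the data range drop points instead of only zooming.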

Axis ticks

What if you want to change the ticks on the axes?

ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  scale_y_continuous(breaks = seq(15, 40, by = 5)) 

#Notice how the y-axis has breaks at multiples of 5, from 15 to 40.
#The function seq() returns a sequence of numbers from a start value to an end value, spaced by the step given in 'by'.
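The break positions can be inspected on their own before passing them to the scale; y_breaks is an illustrative name.

```r
# Start at 15, stop at 40, step by 5
y_breaks <- seq(15, 40, by = 5)
y_breaks
# 15 20 25 30 35 40

# Alternatively, ask for a fixed number of evenly spaced values
seq(15, 40, length.out = 6)
```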

Legends and colour schemes

base <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class))

base + theme(legend.position = "left")

base + theme(legend.position = "top")

base + theme(legend.position = "bottom")

base + theme(legend.position = "right") # the default

#To suppress the display of the legend altogether use
# legend.position = 'none'

How can you change the colour scales?

#Use a ready-made ColorBrewer palette via scale_colour_brewer()
ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(color = drv, shape = drv)) +
  scale_colour_brewer(palette = "Set1")

#Setting colours manually
presidential %>%
  mutate(id = 33 + row_number()) %>%
  ggplot(aes(start, id, colour = party)) +
    geom_point() +
    geom_segment(aes(xend = end, yend = id)) +
    scale_colour_manual(values = c(Republican = "red", Democratic = "blue"))

Themes

You can customise the entire theme of your plot.

ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth(se = FALSE) +
  theme_bw()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth(se = FALSE) +
  theme_light()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth(se = FALSE) +
  theme_classic()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth(se = FALSE) +
  theme_dark()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

#You can also set the individual arguments of theme() yourself.
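A minimal sketch of adjusting individual theme elements on top of a complete theme. The particular elements changed here are arbitrary choices for illustration; p_theme is an illustrative name.

```r
library(ggplot2)

p_theme <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(color = class)) +
  theme_bw() +
  theme(
    legend.position = "bottom",               # move the legend under the plot
    panel.grid.minor = element_blank(),       # remove the minor grid lines
    axis.title = element_text(face = "bold")  # bold axis titles
  )
p_theme
```

Calls to theme() should come after the complete theme (here theme_bw()), since a complete theme added later replaces earlier settings.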